Midland County
Supplementary Material for DeWave: Discrete Encoding of EEG Waves for EEG to Text Translation
In this material, we give more technical details as well as additional experiments to support the main paper. The overview of the proposed framework, DeWave, is illustrated in Figure 6. The dataset is split into training (80%), development (10%), and testing (10%) sets, comprising 10,874, 1,387, and 1,387 unique sentences, respectively, with no overlap. We release our implementation code through GitHub to contribute to this area. Section 3.3, where a 6-layer CNN encoder slides through the whole wave and gets the embedding. The codex encoder shares the same structure with word-level features.
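A minimal sketch of a sentence-level split with no overlap between sets, as the excerpt above describes. The function name `split_by_sentence`, the seed, and the shuffling strategy are assumptions for illustration; the excerpt does not say how the authors assigned sentences to each set.

```python
import random


def split_by_sentence(sentences, train_frac=0.8, dev_frac=0.1, seed=0):
    """Split unique sentences into train/dev/test sets that share no sentence.

    Hypothetical helper: the fractions mirror the 80/10/10 split quoted
    above, but the shuffling and seeding are assumptions.
    """
    # Deduplicate first so the same sentence cannot land in two sets.
    unique = sorted(set(sentences))
    rng = random.Random(seed)
    rng.shuffle(unique)

    n_train = int(len(unique) * train_frac)
    n_dev = int(len(unique) * dev_frac)

    train = unique[:n_train]
    dev = unique[n_train:n_train + n_dev]
    test = unique[n_train + n_dev:]  # remainder goes to the test set
    return train, dev, test


if __name__ == "__main__":
    # Toy corpus with one duplicated sentence.
    corpus = [f"sentence {i}" for i in range(100)] + ["sentence 0"]
    train, dev, test = split_by_sentence(corpus)
    print(len(train), len(dev), len(test))
```

Splitting on deduplicated sentences rather than on individual recordings is what keeps a sentence read by several subjects from appearing in more than one set.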
- North America > United States > California (0.05)
- North America > United States > Texas > Travis County > Austin (0.05)
- North America > United States > Florida > Dade County (0.04)
- (9 more...)
- North America > United States > California (0.05)
- North America > United States > Texas > Travis County > Austin (0.05)
- South America > Venezuela > Capital District > Caracas (0.04)
- (8 more...)
TNG-CLIP: Training-Time Negation Data Generation for Negation Awareness of CLIP
Cai, Yuliang, Thomason, Jesse, Rostami, Mohammad
Vision-language models (VLMs), such as CLIP, have demonstrated strong performance across a range of downstream tasks. However, CLIP is still limited in negation understanding: the ability to recognize the absence or exclusion of a concept. Existing methods address the problem by using a large language model (LLM) to generate large-scale data of image captions containing negation for further fine-tuning CLIP. However, these methods are both time- and compute-intensive, and their evaluations are typically restricted to image-text matching tasks. To expand the horizon, we (1) introduce a training-time negation data generation pipeline in which negation captions are generated during the training stage, adding only 2.5% to training time, and (2) propose the first benchmark, Neg-TtoI, for evaluating text-to-image generation models on prompts containing negation, assessing a model's ability to produce semantically accurate images. We show that our proposed method, TNG-CLIP, achieves SOTA performance on diverse negation benchmarks of image-to-text matching, text-to-image retrieval, and image generation.
- North America > United States > California (0.14)
- North America > United States > Texas > Midland County (0.04)
- Asia (0.04)
WavePulse: Real-time Content Analytics of Radio Livestreams
Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay
Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at https://wave-pulse.io.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > New York > Kings County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (215 more...)
- Media > Radio (1.00)
- Leisure & Entertainment (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder
Wang, Jiaqi, Song, Zhenxi, Ma, Zhengyu, Qiu, Xipeng, Zhang, Min, Zhang, Zhiguo
Reconstructing natural language from non-invasive electroencephalography (EEG) holds great promise as a language decoding technology for brain-computer interfaces (BCIs). However, EEG-based language decoding is still in its nascent stages, facing several technical issues such as: 1) the absence of a hybrid strategy that can effectively integrate cross-modality (between EEG and text) self-learning with intra-modality self-reconstruction of EEG features or textual sequences; 2) under-utilization of large language models (LLMs) to enhance EEG-based language decoding. To address these issues, we propose the Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text through a dedicated multi-stream encoder. Furthermore, we develop a framework called E2T-PTR (EEG-to-Text decoding using Pretrained Transferable Representations), which leverages pre-trained modules alongside the EEG stream from CET-MAE and further enables an LLM (specifically BART) to decode text from EEG sequences. Comprehensive experiments conducted on the popular text-evoked EEG database, ZuCo, demonstrate the superiority of E2T-PTR, which outperforms the state-of-the-art in ROUGE-1 F1 and BLEU-4 scores by 8.34% and 32.21%, respectively. These results indicate significant advancements in the field and underscore the proposed framework's potential to enable more powerful and widespread BCI applications.
- North America > United States > District of Columbia (0.05)
- North America > United States > Massachusetts > Norfolk County > Quincy (0.04)
- North America > United States > Texas > Midland County > Midland (0.04)
- (6 more...)
- Education > Educational Setting (0.68)
- Health & Medicine > Therapeutic Area (0.66)
DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback
Sun, Jiao, Fu, Deqing, Hu, Yushi, Wang, Su, Rassin, Royi, Juan, Da-Cheng, Alon, Dana, Herrmann, Charles, van Steenkiste, Sjoerd, Krishna, Ranjay, Rashtchian, Cyrus
Despite their widespread success, Text-to-Image models (T2I) still struggle to produce images that are both aesthetically pleasing and faithful to the user's input text. We introduce DreamSync, a training algorithm that is model-agnostic by design and improves the faithfulness of T2I models to the text input. DreamSync builds off a recent insight from TIFA's evaluation framework -- that large vision-language models (VLMs) can effectively identify the fine-grained discrepancies between generated images and the text inputs. DreamSync uses this insight to train T2I models without any labeled data; it improves T2I models using its own generations. First, it prompts the model to generate several candidate images for a given input text. Then, it uses two VLMs to select the best generation: a Visual Question Answering model that measures the alignment of generated images to the text, and another that measures the generation's aesthetic quality. After selection, we use LoRA to iteratively finetune the T2I model to guide its generation towards the selected best generations. DreamSync does not need any additional human annotation, model architecture changes, or reinforcement learning. Despite its simplicity, DreamSync improves both the semantic alignment and aesthetic appeal of two diffusion-based T2I models, evidenced by multiple benchmarks (+1.7% on TIFA, +2.9% on DSG1K, +3.4% on VILA aesthetic) and human evaluation.
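The selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the DreamSync implementation: the name `select_best`, the two threshold values, and the rule for combining the two scores (filter on both, then rank by alignment and break ties by aesthetics) are hypothetical, and the scoring callables stand in for the two VLMs.

```python
from typing import Callable, List, Optional


def select_best(
    candidates: List[str],
    alignment_score: Callable[[str], float],
    aesthetic_score: Callable[[str], float],
    min_alignment: float = 0.9,   # assumed threshold, not from the abstract
    min_aesthetic: float = 0.6,   # assumed threshold, not from the abstract
) -> Optional[str]:
    """Pick one candidate image (represented here by an identifier string).

    Candidates must pass both thresholds; among the survivors, the one with
    the highest alignment score wins, with aesthetics as the tie-breaker.
    Returns None when no candidate passes, so the prompt can be skipped.
    """
    kept = [
        c for c in candidates
        if alignment_score(c) >= min_alignment
        and aesthetic_score(c) >= min_aesthetic
    ]
    if not kept:
        return None
    return max(kept, key=lambda c: (alignment_score(c), aesthetic_score(c)))


if __name__ == "__main__":
    # Stub scores standing in for the VQA and aesthetic models.
    align = {"img_a": 0.95, "img_b": 0.95, "img_c": 0.50}
    looks = {"img_a": 0.70, "img_b": 0.80, "img_c": 0.99}
    print(select_best(list(align), align.get, looks.get))
```

The selected (prompt, image) pairs would then form the fine-tuning set for the next LoRA iteration.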
- North America > United States > California (0.14)
- North America > United States > Texas > Stonewall County (0.04)
- North America > United States > Texas > Midland County (0.04)
- (3 more...)
Efficient Graphics Representation with Differentiable Indirection
Datta, Sayantan, Marshall, Carl, Nowrouzezahrai, Derek, Dong, Zhao, Li, Zhengqin
We introduce differentiable indirection -- a novel learned primitive that employs differentiable multi-scale lookup tables as an effective substitute for traditional compute and data operations across the graphics pipeline. We demonstrate its flexibility on a number of graphics tasks, i.e., geometric and image representation, texture mapping, shading, and radiance field representation. In all cases, differentiable indirection seamlessly integrates into existing architectures, trains rapidly, and yields both versatile and efficient results.
- North America > Canada > Quebec > Montreal (0.28)
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Oceania > Australia > New South Wales > Sydney (0.05)
- (10 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Vision (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Deep Representation Learning for Open Vocabulary Electroencephalography-to-Text Decoding
Amrani, Hamza, Micucci, Daniela, Napoletano, Paolo
Previous research has demonstrated the potential of using pre-trained language models for decoding open vocabulary Electroencephalography (EEG) signals captured through a non-invasive Brain-Computer Interface (BCI). However, the impact of embedding EEG signals in the context of language models and the effect of subjectivity remain unexplored, leading to uncertainty about the best approach to enhance decoding performance. Additionally, current evaluation metrics used to assess decoding effectiveness are predominantly syntactic and do not provide insights into how comprehensible the decoded output is to humans. We present an end-to-end deep learning framework for non-invasive brain recordings that brings modern representational learning approaches to neuroscience. Our proposal introduces the following innovations: 1) an end-to-end deep learning architecture for open vocabulary EEG decoding, incorporating a subject-dependent representation learning module for raw EEG encoding, a BART language model, and a GPT-4 sentence refinement module; 2) a more comprehensive sentence-level evaluation metric based on the BERTScore; 3) an ablation study that analyses the contributions of each module within our proposal, providing valuable insights for future research. We evaluate our approach on two publicly available datasets, ZuCo v1.0 and v2.0, comprising EEG recordings of 30 subjects engaged in natural reading tasks. Our model achieves a BLEU-1 score of 42.75%, a ROUGE-1-F of 33.28%, and a BERTScore-F of 53.86%, outperforming the previous state-of-the-art methods by 3.38%, 8.43%, and 6.31%, respectively.
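For readers unfamiliar with the BLEU-1 figure quoted above, the sketch below computes sentence-level BLEU-1 (clipped unigram precision times a brevity penalty) for a single reference. Whitespace tokenization, the single reference, and the function name `bleu1` are simplifying assumptions; the abstract does not state which BLEU implementation or tokenizer the authors used, and published scores are normally computed at corpus level.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against one reference, whitespace-tokenized."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    # Clipped unigram precision: each candidate token is credited at most
    # as many times as it occurs in the reference.
    ref_counts = Counter(ref)
    clipped = sum(
        min(count, ref_counts[tok]) for tok, count in Counter(cand).items()
    )
    precision = clipped / len(cand)

    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * precision


if __name__ == "__main__":
    print(bleu1("the cat sat on the mat", "the cat sat on the mat"))
    print(bleu1("the the the", "the cat sat"))
```

Because the score only counts overlapping words, it is the kind of surface-level metric that the abstract contrasts with BERTScore.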
- North America > United States > Florida > Dade County (0.04)
- North America > United States > Texas > Midland County > Midland (0.04)
- North America > United States > Florida > Miami-Dade County (0.04)
- (5 more...)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
Ji, Yatai, Wang, Junjie, Gong, Yuan, Zhang, Lin, Zhu, Yanru, Wang, Hongfa, Zhang, Jiaxing, Sakai, Tetsuya, Yang, Yujiu
Multimodal semantic understanding often has to deal with uncertainty, which means the obtained messages tend to refer to multiple targets. Such uncertainty, both inter- and intra-modal, makes interpretation difficult. Little work has studied the modeling of this uncertainty, particularly in pre-training on unlabeled datasets and fine-tuning on task-specific downstream datasets. In this paper, we project the representations of all modalities as probabilistic distributions via a Probability Distribution Encoder (PDE) by utilizing sequence-level interactions. Compared to the existing deterministic methods, such uncertainty modeling can convey richer multimodal semantic information and more complex relationships. Furthermore, we integrate uncertainty modeling with popular pre-training frameworks and propose suitable pre-training tasks: Distribution-based Vision-Language Contrastive learning (D-VLC), Distribution-based Masked Language Modeling (D-MLM), and Distribution-based Image-Text Matching (D-ITM). The fine-tuned models are applied to challenging downstream tasks, including image-text retrieval, visual question answering, visual reasoning, and visual entailment, and achieve state-of-the-art results.
- North America > United States > Texas > Midland County (0.04)
- Europe > Spain > Castile and León > Valladolid Province > Valladolid (0.04)
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Research Report > New Finding (0.56)
- Research Report > Experimental Study (0.38)
Texas Sues Google Over Use of Facial Images
The Texas attorney general sued Alphabet's Google on Thursday, alleging the search giant violated state laws by collecting biometric data on face and voice features without seeking the full consent of users. Texas alleged Google's data-collection practices stretched back to 2015 and affected millions of the state's residents, according to a complaint filed in state district court in Midland County, Texas. "Google's indiscriminate collection of the personal information of Texans, including very sensitive information like biometric identifiers, will not be tolerated," Texas Attorney General Ken Paxton said. "I will continue to fight Big Tech to ensure the privacy and security of all Texans."
- North America > United States > Texas > Midland County (0.26)
- North America > United States > Illinois (0.08)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)